The Global Terrorism Database (GTD) is the most comprehensive unclassified database of terrorist attacks in the world. The National Consortium for the Study of Terrorism and Responses to Terrorism (START) makes the GTD available through its website in an effort to improve understanding of terrorist violence, so that it can be more readily studied and defeated. The GTD is produced by a dedicated team of researchers and technical staff.
The GTD is an open-source database that provides information on domestic and international terrorist attacks around the world since 1970, and now includes more than 200,000 events. For each event, a wide range of information is available, including the date and location of the incident, the weapons used, the nature of the target, the number of casualties, and, when identifiable, the group or individual responsible. Dataset link: https://www.start.umd.edu/gtd/access/
library(tidyverse)
library(data.table)
library(lubridate)
library(RColorBrewer)
library(gridExtra)
library(plotly)
library(ggthemes)
library(wesanderson)
library(leaflet)
library(VIM)
# Read the CSV; empty strings and "NA" are both treated as missing values
dt <- as_tibble(fread("globalterrorismdb_0221dist.csv",
                      na.strings = c("", "NA")))
dt
## # A tibble: 201,183 x 135
## eventid iyear imonth iday approxdate extended resolution country country_txt
## <int64> <int> <int> <int> <chr> <int> <chr> <int> <chr>
## 1 197000000001 1970 7 2 <NA> 0 <NA> 58 Dominican ~
## 2 197000000002 1970 0 0 <NA> 0 <NA> 130 Mexico
## 3 197001000001 1970 1 0 <NA> 0 <NA> 160 Philippines
## 4 197001000002 1970 1 0 <NA> 0 <NA> 78 Greece
## 5 197001000003 1970 1 0 <NA> 0 <NA> 101 Japan
## 6 197001010002 1970 1 1 <NA> 0 <NA> 217 United Sta~
## 7 197001020001 1970 1 2 <NA> 0 <NA> 218 Uruguay
## 8 197001020002 1970 1 2 <NA> 0 <NA> 217 United Sta~
## 9 197001020003 1970 1 2 <NA> 0 <NA> 217 United Sta~
## 10 197001030001 1970 1 3 <NA> 0 <NA> 217 United Sta~
## # ... with 201,173 more rows, and 126 more variables: region <int>,
## # region_txt <chr>, provstate <chr>, city <chr>, latitude <dbl>,
## # longitude <dbl>, specificity <int>, vicinity <int>, location <chr>,
## # summary <chr>, crit1 <int>, crit2 <int>, crit3 <int>, doubtterr <int>,
## # alternative <int>, alternative_txt <chr>, multiple <int>, success <int>,
## # suicide <int>, attacktype1 <int>, attacktype1_txt <chr>, attacktype2 <int>,
## # attacktype2_txt <chr>, attacktype3 <int>, attacktype3_txt <chr>, ...
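The `na.strings = c("", "NA")` argument is what turns blank cells and literal `"NA"` strings into real missing values when the file is read. The sketch below checks that behaviour on a tiny inline CSV passed through `fread(text = ...)`; the CSV contents and the name `toy` are invented for illustration and are not GTD data.

```r
library(data.table)

# Invented three-row CSV: row 2 has blank city and nkill, row 3 has a literal "NA"
csv <- "eventid,city,nkill\n1,Athens,3\n2,,\n3,NA,0"

# Same na.strings setting as the GTD import above
toy <- fread(text = csv, header = TRUE, na.strings = c("", "NA"))
toy
```

Both the blank city in row 2 and the `"NA"` city in row 3 come back as `NA`, while the real value `"Athens"` is kept.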
There are 135 variables in the original data. We'll select variables that are relatively easy to interpret and have fewer missing values: year, month, location, number of kills, ransom, suicide, and so on.
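# Keep 17 easy-to-interpret columns (selected by name rather than by position)
gbtr <- select(dt, eventid, iyear, imonth, iday, country_txt, region_txt,
               provstate, city, latitude, longitude, location, success,
               suicide, gname, nkill, nhours, ransom)
# The GTD codes an unknown month or day as 0; recode it as NA
gbtr$imonth[gbtr$imonth==0] <- NA
gbtr$iday[gbtr$iday==0] <- NA
# Events from 2000 onward (the NA recoding above carries over)
gbtr2k <- gbtr %>% filter(iyear>=2000)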
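The zero-to-`NA` recoding of `imonth` and `iday` can also be written as a single `mutate()` with `dplyr::na_if()`, which keeps the cleaning inside the pipe. This is a minimal sketch on invented rows (the name `toy` is hypothetical), not a replacement the original analysis uses.

```r
library(dplyr)

# Invented rows mimicking GTD date fields; 0 stands for "unknown"
toy <- tibble(
  iyear  = c(1970L, 1970L, 2001L),
  imonth = c(7L, 0L, 9L),
  iday   = c(2L, 0L, 11L)
)

# na_if() replaces every value equal to 0L with NA in both columns
toy_clean <- toy %>%
  mutate(across(c(imonth, iday), ~ na_if(.x, 0L)))
toy_clean
```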
glimpse(gbtr)
## Rows: 201,183
## Columns: 17
## $ eventid <int64> 197000000001, 197000000002, 197001000001, 197001000002, ~
## $ iyear <int> 1970, 1970, 1970, 1970, 1970, 1970, 1970, 1970, 1970, 1970~
## $ imonth <int> 7, NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ iday <int> 2, NA, NA, NA, NA, 1, 2, 2, 2, 3, 1, 6, 8, 9, 9, 10, 11, 1~
## $ country_txt <chr> "Dominican Republic", "Mexico", "Philippines", "Greece", "~
## $ region_txt <chr> "Central America & Caribbean", "North America", "Southeast~
## $ provstate <chr> "National", "Federal", "Tarlac", "Attica", "Fukouka", "Ill~
## $ city <chr> "Santo Domingo", "Mexico city", "Unknown", "Athens", "Fuko~
## $ latitude <dbl> 18.45679, 19.37189, 15.47860, 37.99749, 33.58041, 37.00511~
## $ longitude <dbl> -69.95116, -99.08662, 120.59974, 23.76273, 130.39636, -89.~
## $ location <chr> NA, NA, NA, NA, NA, NA, NA, "Edes Substation", NA, NA, NA,~
## $ success <int> 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ suicide <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
## $ gname <chr> "MANO-D", "23rd of September Communist League", "Unknown",~
## $ nkill <int> 1, 0, 1, NA, NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NA, 1, 0, 0~
## $ nhours <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
## $ ransom <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
head(gbtr)
## # A tibble: 6 x 17
## eventid iyear imonth iday country_txt region_txt provstate city latitude
## <int64> <int> <int> <int> <chr> <chr> <chr> <chr> <dbl>
## 1 197000000001 1970 7 2 Dominican ~ Central A~ National Sant~ 18.5
## 2 197000000002 1970 NA NA Mexico North Ame~ Federal Mexi~ 19.4
## 3 197001000001 1970 1 NA Philippines Southeast~ Tarlac Unkn~ 15.5
## 4 197001000002 1970 1 NA Greece Western E~ Attica Athe~ 38.0
## 5 197001000003 1970 1 NA Japan East Asia Fukouka Fuko~ 33.6
## 6 197001010002 1970 1 1 United Sta~ North Ame~ Illinois Cairo 37.0
## # ... with 8 more variables: longitude <dbl>, location <chr>, success <int>,
## # suicide <int>, gname <chr>, nkill <int>, nhours <dbl>, ransom <int>
matrixplot(gbtr, sortby = c("nkill"))
aggr(gbtr, labels=names(gbtr),cex.axis = .9)
Variables such as location, nhours, and ransom have a large number of missing values, so we'll leave them out of the EDA to reduce complexity.
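The VIM plots show missingness visually; a numeric share of missing values per column makes the same decision easier to justify. Below is a minimal sketch with `summarise(across(...))` on an invented tibble (`toy` and `miss_share` are hypothetical names); on the real data the same call would be `gbtr %>% summarise(across(everything(), ~ mean(is.na(.x))))`.

```r
library(dplyr)

# Invented columns with different amounts of missingness
toy <- tibble(
  nkill  = c(1L, 0L, NA, 3L),
  nhours = c(NA, NA, NA, 2),
  ransom = c(0L, NA, NA, NA)
)

# mean(is.na(x)) is the proportion of missing entries in each column
miss_share <- toy %>%
  summarise(across(everything(), ~ mean(is.na(.x))))
miss_share
```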
p <- gbtr %>% mutate(iyear=as.factor(iyear)) %>%
group_by(iyear) %>% count() %>%
ggplot(aes(x=iyear,y=n,group=1)) +
geom_line(size=1, color="brown")+
geom_point(color="brown") +
scale_x_discrete(
breaks=c("1970", "2000","2008", "2011", "2014","2017")
) +
labs(title = "Events by year", x = "year", y = "count")+
theme_economist()
p
The number of terrorist events has risen rapidly since 2000. Next, we'll look at the trend separately for each region.
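To put a number on "rising rapidly", one option is to count events per year and take the year-over-year difference with `lag()`. This is a sketch on invented rows (`toy` and `by_year` are hypothetical); with the real data, `gbtr` would take the place of `toy`.

```r
library(dplyr)

# Invented events: one in 1999, two in 2000, three in 2001
toy <- tibble(iyear = c(1999L, 2000L, 2000L, 2001L, 2001L, 2001L))

# count() gives events per year; lag(n) is the previous year's count
by_year <- toy %>%
  count(iyear) %>%
  arrange(iyear) %>%
  mutate(change = n - lag(n))
by_year
```

The first year has no predecessor, so its `change` is `NA`.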
p4 <- gbtr %>% count(region_txt, iyear) %>%
ggplot(aes(iyear, n,color=region_txt)) +
geom_line(aes(group=region_txt)) +
labs(title = "Trend by Region", x="year", y="count", color="region")+
theme_light()
ggplotly(p4)
Hover over the plot to see the region labels: Middle East & North Africa and South Asia are the regions mainly responsible for the spike.
Since there is a steep upward trend from approximately 2000 onward, we'll inspect the periods before and after 2000 separately.
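The period split relies on turning the year into an ordered factor so that "before 2000" is plotted ahead of "after 2000". A minimal sketch of that step on invented rows (the names `toy` and `by_period` are hypothetical):

```r
library(dplyr)

# Invented events for two regions
toy <- tibble(
  region_txt = c("South Asia", "South Asia", "South Asia", "Western Europe"),
  iyear      = c(1985L, 2005L, 2010L, 1990L)
)

# Label each event by period; explicit levels fix the plotting order
by_period <- toy %>%
  mutate(pd = factor(if_else(iyear < 2000, "before 2000", "after 2000"),
                     levels = c("before 2000", "after 2000"))) %>%
  count(region_txt, pd)
by_period
```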
p2 <- gbtr %>% mutate(pd=ifelse(iyear<2000,"before 2000", "after 2000")) %>%
mutate(pd = factor(pd, levels = c("before 2000", "after 2000")))%>%
group_by(region_txt, pd) %>% count() %>%
ggplot(aes(x=reorder(region_txt, n), y=n))+
geom_bar(aes( fill=pd), stat= "identity", position = "dodge")+
labs(title = "Events by region", x = "region", y = "count", fill = "period")+
theme_economist()+
scale_fill_manual(values = c("#66b2b2","#006666")) +
coord_flip()
p2
After 2000, the region with the most terrorist attacks is “Middle East & North Africa” (before 2000 it was “South America”).
“South Asia” shows the largest increase in attacks between the two periods.
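The "largest increase" can be checked directly rather than read off the bars: count events in each period per region and sort by the difference. A sketch on invented rows (`toy` and `growth` are hypothetical names); on the real data `gbtr` would replace `toy`.

```r
library(dplyr)

# Invented events: South Asia grows, South America shrinks
toy <- tibble(
  region_txt = c("South Asia", "South Asia", "South Asia",
                 "South America", "South America"),
  iyear      = c(1980L, 2005L, 2012L, 1985L, 1990L)
)

# sum() of a logical condition counts the matching events
growth <- toy %>%
  group_by(region_txt) %>%
  summarise(before = sum(iyear < 2000),
            after  = sum(iyear >= 2000),
            .groups = "drop") %>%
  mutate(increase = after - before) %>%
  arrange(desc(increase))
growth
```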
pkr <- gbtr2k %>% filter(!is.na(nkill)) %>% group_by(region_txt) %>%
summarise(ksum=sum(nkill)) %>%
ggplot(aes(reorder(region_txt,ksum), ksum))+
geom_bar(stat = "identity", fill="#2E8B57")+
coord_flip()+
labs(title = "Num. of kills by region", subtitle = "without missing values, after 2000", x="region", y="count")+
theme_economist()
per <- gbtr2k %>% group_by(region_txt) %>% count() %>%
ggplot(aes(x=reorder(region_txt, n), y=n))+
geom_bar(stat= "identity", fill="#006666")+
labs(title = "Events by region",subtitle = "after 2000", x = "region", y = "count")+
theme_economist()+
coord_flip()
grid.arrange(pkr,per,ncol=2)
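The two panels can also be computed in a single pass. Note that they use slightly different rows: the kills panel drops events with missing `nkill`, while the events panel counts every event. In the sketch below, `sum(..., na.rm = TRUE)` gives the same total as filtering first, and `n()` still counts all events. The data and the names `toy` and `by_region` are invented.

```r
library(dplyr)

# Invented events; one kill count is missing
toy <- tibble(
  region_txt = c("A", "A", "B", "B", "B"),
  nkill      = c(2L, NA, 0L, 5L, 1L)
)

by_region <- toy %>%
  group_by(region_txt) %>%
  summarise(ksum   = sum(nkill, na.rm = TRUE),  # total deaths, NA ignored
            events = n(),                        # all events, including NA kills
            .groups = "drop")
by_region
```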
Next, we'll look at the data after 2000 by country.
pec <- gbtr2k %>% group_by(country_txt) %>% count() %>% ungroup() %>%
top_n(n=20,wt = n) %>%
ggplot(aes(reorder(country_txt, n), n))+
geom_bar(stat = "identity", fill="#21618C") +
labs(title = "Events by country", subtitle = "after 2000", x = "Country", y = "Count") +
theme_economist() +
scale_fill_manual(values = wes_palette(n=4,"Cavalcanti1"))+
coord_flip()
pec
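In dplyr 1.0.0 and later, `top_n()` is superseded by `slice_max()`, which states the ordering column and the number of rows explicitly. A sketch of the top-country step on invented rows (`toy` and `top2` are hypothetical; the real chart keeps 20 countries):

```r
library(dplyr)

# Invented events per country
toy <- tibble(country_txt = c("Iraq", "Iraq", "Iraq", "India", "India", "Peru"))

# Count events per country, then keep the 2 countries with the largest n
top2 <- toy %>%
  count(country_txt) %>%
  slice_max(n, n = 2)
top2
```

As with `top_n()`, ties at the cutoff are kept by default (`with_ties = TRUE`), so the result can have more rows than requested.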
dtscd <- gbtr2k %>% filter(!is.na(suicide)) %>% group_by(region_txt, suicide) %>% count() %>%
ungroup() %>% group_by(region_txt) %>% mutate(pct=n/sum(n)) %>% filter(suicide==1)
ggplot(dtscd, aes(reorder(region_txt, pct), pct*100)) +
geom_bar(stat = "identity", fill="#5D6D7E")+
coord_flip()+
labs(title = "Pct of suicide attack by region", subtitle = "after 2000", x="region",y="%")+
theme_economist()
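The suicide-attack share can be computed as the mean of a 0/1 indicator. One side effect of the `filter(suicide==1)` approach above is that a region with no suicide attacks drops out of the chart entirely; with `mean()` it stays in with 0%. A sketch on invented data (`toy` and `pct` are hypothetical names):

```r
library(dplyr)

# Invented events; region C has no suicide attacks
toy <- tibble(
  region_txt = c("A", "A", "A", "A", "B", "B", "C", "C"),
  suicide    = c(1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L)
)

# mean of a logical = share of TRUE values; times 100 for a percentage
pct <- toy %>%
  group_by(region_txt) %>%
  summarise(pct = mean(suicide == 1) * 100, .groups = "drop")
pct
```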
gbtr %>%filter(gname!="Unknown") %>% group_by(gname,suicide) %>% summarise(n=n()) %>%
ungroup() %>% group_by(gname) %>% mutate(sum=sum(n)) %>% ungroup() %>% top_n(30,sum) %>%
ggplot(aes(x=reorder(gname,sum),n, fill=factor(suicide, levels = c(1,0)))) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Groups and attacks", x="groups", y="attacks", fill="suicide") +
theme_economist_white() +
scale_fill_manual(values = wes_palette(n=2, "Cavalcanti1"))
## `summarise()` has grouped output by 'gname'. You can override using the `.groups` argument.
Attacks attributed to “Unknown” groups are excluded from this chart.
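The `group_by()`/`summarise()`/`ungroup()`/`mutate()` chain that attaches each group's total can be shortened with `count()` followed by `add_count(..., wt = n)`. This is an equivalent sketch on invented rows, not the original code; the names `toy`, `by_group`, and `total` are hypothetical.

```r
library(dplyr)

# Invented attacks, including two with an unknown perpetrator
toy <- tibble(
  gname   = c("Unknown", "G1", "G1", "G2", "G1", "Unknown"),
  suicide = c(0L, 1L, 0L, 0L, 0L, 1L)
)

by_group <- toy %>%
  filter(gname != "Unknown") %>%
  count(gname, suicide) %>%                        # attacks per group and suicide flag
  add_count(gname, wt = n, name = "total")         # each group's total attacks
by_group
```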
wp <- dt %>% select(1,2,3,4,9,11,13,14,15,27,28,30,59,83,85,99,102,117)
wp$imonth[wp$imonth==0] <- NA
wp$iday[wp$iday==0] <- NA
patkrg<- wp %>% group_by(region_txt, attacktype1_txt) %>% count() %>%
ggplot(aes(region_txt, n, fill=attacktype1_txt)) +
geom_bar(stat = "identity",position = "stack")+
scale_fill_manual(values = wes_palette("Darjeeling1" ,n=9, type="continuous"))+
theme_economist()+
scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 0.8))+
labs(title = "Attack type by region",
x="region", y="num.", fill="attack type")
patkrg2<- wp %>% group_by(region_txt, attacktype1_txt) %>% count() %>%
ggplot(aes(region_txt, n, fill=attacktype1_txt)) +
geom_bar(stat = "identity",position = "fill")+
scale_fill_manual(values = wes_palette("Darjeeling1" ,n=9, type="continuous"))+
theme_economist()+
scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 0.8))+
labs(title = "Attack type by region (share)",
x="region", y="proportion", fill="attack type")
patkrg
patkrg2
Different groups may prefer different attack methods. There are 3671 groups in the data, so we'll look at the groups with the most attacks.
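Counting groups and picking the most active ones can be sketched with `n_distinct()` and `slice_max()`. Whether a figure like the group count includes the "Unknown" placeholder depends on how it is computed, so the sketch shows both variants. The data and the names `toy`, `n_groups_all`, `n_groups_known`, and `grp_toy` are invented for illustration.

```r
library(dplyr)

# Invented attacks per group
toy <- tibble(gname = c("Unknown", "G1", "G1", "G2", "G2", "G3", "G1"))

n_groups_all   <- n_distinct(toy$gname)                          # includes "Unknown"
n_groups_known <- n_distinct(toy$gname[toy$gname != "Unknown"])  # named groups only

# The two named groups with the most attacks
grp_toy <- toy %>%
  filter(gname != "Unknown") %>%
  count(gname) %>%
  slice_max(n, n = 2)
grp_toy
```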
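# `grp` is not defined elsewhere in this analysis. Assumption: the 10 most active
# named groups after 2000 (cutoff and period are guesses and may not match the table below)
grp <- gbtr2k %>% filter(gname != "Unknown") %>% count(gname) %>% slice_max(n, n = 10)
wp %>% filter(gname %in% grp$gname)%>%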
group_by(gname, attacktype1_txt) %>% count() %>%
ggplot(aes(gname, n, fill= attacktype1_txt))+
geom_bar(stat = "identity",position = "stack")+
scale_fill_manual(values = wes_palette("Darjeeling1",n=9, type="continuous"))+
theme_economist()+
scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 5))+
labs(title = "Attack type by groups", subtitle = "Groups with the most attacks",
x="groups", y="count", fill="attack type")
wp %>% filter(attacktype1_txt=="Bombing/Explosion" & gname %in% grp$gname ) %>%
group_by(gname, suicide) %>% count() %>% ungroup() %>% group_by(gname) %>% mutate(pct=n/sum(n)) %>% filter(suicide==1) %>% arrange(desc(pct))
## # A tibble: 6 x 4
## # Groups: gname [6]
## gname suicide n pct
## <chr> <int> <int> <dbl>
## 1 Boko Haram 1 510 0.527
## 2 Islamic State of Iraq and the Levant (ISIL) 1 1369 0.317
## 3 Taliban 1 727 0.216
## 4 Al-Shabaab 1 192 0.125
## 5 Kurdistan Workers' Party (PKK) 1 27 0.0310
## 6 Houthi extremists (Ansar Allah) 1 2 0.00142
wp %>% filter(!is.na(nkill)&attacktype1_txt!="Unknown") %>%
group_by(region_txt,attacktype1_txt) %>%
summarise(sumk=sum(nkill), event=n(), kperattack=sum(nkill)/n()) %>%
ggplot(aes(reorder(attacktype1_txt, kperattack), kperattack))+
geom_bar(aes(fill=region_txt), stat = "identity")+
coord_flip()+
facet_wrap(.~ region_txt, ncol = 4, scales = "free_x")+
labs(title = "Deaths per event by attack type and region", x="attack type", y="deaths per event")+
scale_fill_manual(values = wes_palette("Darjeeling1", n=12, type = "continuous"))+
theme(legend.position = "none")
## `summarise()` has grouped output by 'region_txt'. You can override using the `.groups` argument.
The attack types that cause the most deaths per attack differ sharply from region to region.
Surprisingly, bombing is not the deadliest attack type per attack. In most regions, armed assault and hostage taking are.
Hostage taking has the most deaths per attack in East Asia, Eastern Europe, Middle East & North Africa, South Asia, Southeast Asia, Sub-Saharan Africa and Western Europe.
North America's extreme value reflects the September 11, 2001 attacks, with nearly 3,000 recorded deaths in 4 attacks.
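Deaths per attack (`sum(nkill)/n()`) is a mean, so a handful of extreme events can dominate it, as the North America panel shows. Reporting the median next to it is one way to see whether a high value reflects typical attacks or a few outliers. This sketch uses invented numbers (the names `toy` and `deadliness` are hypothetical) and is not part of the original analysis.

```r
library(dplyr)

# Invented events: one extreme hijacking among otherwise low-casualty ones
toy <- tibble(
  attacktype1_txt = c(rep("Hijacking", 4), rep("Armed Assault", 3)),
  nkill           = c(0L, 0L, 1L, 1000L, 2L, 3L, 4L)
)

deadliness <- toy %>%
  group_by(attacktype1_txt) %>%
  summarise(events     = n(),
            kperattack = sum(nkill) / n(),  # mean deaths per event, as in the plot
            median_k   = median(nkill),     # robust to the single outlier
            .groups = "drop")
deadliness
```

Here the hijacking mean is driven almost entirely by one event, while its median stays below one death.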